RankReduce - Processing K-Nearest Neighbor Queries on Top of MapReduce
نویسندگان
چکیده
We consider the problem of processing K-Nearest Neighbor (KNN) queries over large datasets where the index is jointly maintained by a set of machines in a computing cluster. The proposed RankReduce approach uses locality sensitive hashing (LSH) together with a MapReduce implementation, which by design is a perfect match as the hashing principle of LSH can be smoothly integrated in the mapping phase of MapReduce. The LSH algorithm assigns similar objects to the same fragments in the distributed file system which enables a effective selection of potential candidate neighbors which get then reduced to the set of K-Nearest Neighbors. We address problems arising due to the different characteristics of MapReduce and LSH to achieve an efficient search process on the one hand and high LSH accuracy on the other hand. We discuss several pitfalls and detailed descriptions on how to circumvent these. We evaluate RankReduce using both synthetic data and a dataset obtained from Flickr.com demonstrating the suitability of the approach.
منابع مشابه
Processing All k-Nearest Neighbor Queries in Hadoop
A k-nearest neighbor (kNN) query, which retrieves nearest k points from a database is one of the fundamental query types in spatial databases. An all k-nearest neighbor query (AkNN query), a variation of a kNN query, determines the k-nearest neighbors for each point in the dataset in a query process. In this paper, we propose a method for processing AkNN queries in Hadoop. We decompose the give...
متن کاملRapid AkNN Query Processing for Fast Classification of Multidimensional Data in the Cloud
A k-nearest neighbor (kNN) query determines the k nearest points, using distance metrics, from a specific location. An all k-nearest neighbor (AkNN) query constitutes a variation of a kNN query and retrieves the k nearest points for each point inside a database. Their main usage resonates in spatial databases and they consist the backbone of many location-based applications and not only (i.e. k...
متن کاملEfficient Multidimensional AkNN Query Processing in the Cloud
A k-nearest neighbor (kNN) query determines the k nearest points, using distance metrics, from a given location. An all k-nearest neighbor (AkNN) query constitutes a variation of a kNN query and retrieves the k nearest points for each point inside a database. Their main usage resonates in spatial databases and they consist the backbone of many location-based applications and not only. In this w...
متن کاملGeometric Approaches for Top-k Queries
Top-k processing is a well-studied problem with numerous applications that is becoming increasingly relevant with the growing availability of recommendation systems and decision making software. The objective of this tutorial is twofold. First, we will delve into the geometric aspects of top-k processing. Second, we will cover complementary features to top-k queries, with strong practical relev...
متن کاملkdANN+: A Rapid AkNN Classifier for Big Data
A k-nearest neighbor (kNN) query determines the k nearest points, using distance metrics, from a given location. An all k-nearest neighbor (AkNN) query constitutes a variation of a kNN query and retrieves the k nearest points for each point inside a database. Their main usage resonates in spatial databases and they consist the backbone of many location-based applications and not only. In this w...
متن کامل